Amino acid pattern reveals multi-functionality of ORF3 protein from HEV

The protein encoded by the smallest open reading frame (ORF) of hepatitis E virus (HEV), ORF3, has recently been shown to perform multiple functions beyond its accessory roles. ORF3 could also act as a vaccine target against HEV infection. Intrinsically disordered regions (IDRs), whether in intrinsically disordered proteins (IDPs) or intrinsically disordered protein regions (IDPRs), play critical roles in various regulatory functions of viruses. The dark proteome of the HEV-ORF3 protein, including its structure and function, was systematically examined using computational predictors to explicate its role in viral pathogenesis and drug resistance beyond its functions as an accessory viral protein. The amino acid distribution showed that ORF3 is enriched in disorder-promoting residues (Ala, Pro, Ser, Gly) and deficient in order-promoting residues (Asn, Ile, Phe, Tyr and Trp). Initial investigation identified ORF3 as either an IDP (entirely disordered protein) or an IDPR (protein consisting of IDRs together with structured globular domains). Structural examination revealed a preponderance of disordered regions, indicating that ORF3 is a moderately to highly disordered protein. Further disorder predictors categorized ORF3 as a highly disordered protein/IDP. The identified sites and their associated molecular functions revealed the involvement of ORF3 in diverse biological processes, substantiating them as targets of regulation. As the functions of ORF3 are yet to be completely explored, data on its disorder could help in elucidating its disorder-related functions.


Background:
Hepatitis E virus (HEV), of the Hepeviridae family, is a major zoonotic pathogen causing acute hepatitis E worldwide [1]. Recent estimates suggest that about 939 million people worldwide have been exposed to HEV infection in the past and that about 15-110 million individuals are currently experiencing HEV infection [2]. In India, the number of hepatitis cases reported to the Central Bureau of Health Intelligence (CBHI) is exceedingly low, as most patients turn to traditional healers owing to the common belief that allopathy offers no cure. Moreover, due to inadequate information, the exact number of HEV cases in the country remains unknown. However, available reports suggest that HEV is responsible for both acute hepatitis (10-40%) and liver failure (15-45%) in India [3, 4].
Currently, HEV comprises 8 genotypes (GTs) (GT I - GT VIII). GT I and GT II infect humans, are transmitted mainly through contaminated water and cause acute hepatitis. GT III and GT IV have an extended host range [5-7] and cause chronic hepatitis in organ transplant recipients [8,9]. Some other HEV strains have been identified in specific hosts, for instance, GT V and GT VI in wild boars [10,11] and GT VII and GT VIII in camels [12,13]. Consumption of improperly cooked meat products is one of the chief causes of sporadic cases in developed nations [14]. The expanding host range of HEV and its newly discovered strains further complicate its implications for human health, its transmission and the risk of infection [14]. In addition, blood-mediated [15] and person-to-person [16] transmission have been reported, as has transmission from pet animals to humans [17,18]. For these reasons, HEV has attracted global attention and is recognized as a major health burden. The anti-HEV antibodies IgG and IgM serve as markers of past HEV infection (persisting for several years) and ongoing infection (persisting for a few months), respectively [19,20].
The HEV genome comprises three well-defined open reading frames (ORFs), i.e., ORF1, ORF2 and ORF3 [21]. The largest reading frame, ORF1, codes for several non-structural proteins that are required for HEV replication [22,23]. The translation product of the structural reading frame ORF2 forms the major component of the virion, i.e., the viral capsid [24,25], and the third reading frame, ORF3, at the 3' terminus codes for a protein that serves regulatory functions [26-28].
The current study analyzes the structurally unknown regions of the HEV ORF3 protein, i.e., the fraction of a proteome that has no noticeable resemblance to any PDB structure. This fraction of the proteome is referred to as the 'dark proteome'. The dark proteome encompasses the complete proteome, with particular emphasis on intrinsically disordered regions (IDRs), i.e., intrinsically disordered protein regions (IDPRs)/intrinsically disordered proteins (IDPs), which lack definite three-dimensional structures within viral proteomes [29]. Studies have shown a correlation between disordered viral protein segments and viral pathogenesis [30,31]. In addition, reports have documented the association of IDPs with several diseases, as they perform diverse roles in regulatory processes. Owing to their involvement in important biological processes, IDPs are considered potential drug targets [32-35]. Although ORF3 was initially considered merely an accessory protein, its functions have recently been linked to the biogenesis of quasi-enveloped viral particles, cellular signalling, regulation of the immune response and the host tropism of HEV. Additionally, its potential to act as a vaccine against HEV has also been documented [36,37].
In this context, we conducted a computational analysis of the HEV ORF3 proteins, examining their intrinsically disordered regions to gain insight into their disorder-related functions. The disorder analysis predicted the ORF3 protein to be highly disordered, and this disorder was found to be associated with several important molecular functions and biological processes, such as binding (e.g., ion, protein and metal binding), viral replication and the RNA biosynthetic process, in addition to the occurrence of post-translationally modified sites in its polypeptide chain. Taken together, these observations clearly indicate the involvement of the ORF3 protein in various significant processes as well as its interaction with the membrane of the host cell. The present study can provide some novel insights into the functions of the ORF3 protein beyond its accessory roles in the HEV life cycle.
©Biomedical Informatics (2024) Bioinformation 20(2): 121-135 (2024)

Materials and Methods:
Sequence retrieval:
The sequences of the HEV ORF3 protein were retrieved from GenBank, hosted by the NCBI (National Center for Biotechnology Information). The obtained sequences encompassed different GTs (GT I, GT II, GT III, GT IV, GT V, GT VI, GT VII and GT VIII) and hosts (human, wild boar, swine and camel), as mentioned in Table 1.

Amino acid composition prediction:
The amino acid distribution pattern in HEV ORF3 was examined using the online server Expasy ProtParam [38]. ProtParam computes various parameters for a protein sequence entered by the user.
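As an illustration of what a composition analysis of this kind reports, the percentage of each residue can be computed directly from a sequence. The following is a minimal sketch and not the ProtParam implementation; the function name `composition_percent` and the short toy sequence are hypothetical and are not taken from any HEV ORF3 entry.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequence):
    """Return the percentage of each standard amino acid in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # Percentages are relative to the full sequence length.
    return {aa: 100.0 * counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


# Toy sequence for illustration only (an assumption, not real ORF3 data).
toy_sequence = "MPSPAPGLSA"
composition = composition_percent(toy_sequence)

# Print residues that occur, most abundant first.
for aa, pct in sorted(composition.items(), key=lambda kv: -kv[1]):
    if pct > 0:
        print(f"{aa}: {pct:.1f}%")
```

Non-standard symbols (for example X) would count towards the sequence length but are not reported, which is a simplification of this sketch.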

Three dimensional (3D) structure analyses with disorder prediction:
The 3D models of the HEV ORF3 protein were predicted using the I-TASSER [39] webserver, which constructs structures using a threading-based approach, and were then analyzed. Additionally, we measured the secondary structure content of the ORF3 models using the Phyre2 (Protein Homology/AnalogY Recognition Engine) [40] webserver. Further, the occurrence of intrinsic disorder within the HEV ORF3 proteins was predicted using PONDR (Predictor of Natural Disordered Regions) [41], an online tool, at its default settings. Three PONDR predictors, VSL2, VL3 and VL-XT, were used to evaluate the intrinsic disorder status of the ORF3 proteins.

Potential disorder-based binding site prediction:
The disorder-based protein-binding residues of the ORF3 proteins were identified using a combination of two webservers, DISOPRED3 [42] and IUPred2A [43]. A cut-off score of 0.5 was used for disordered protein-binding residue prediction with both webservers.
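The cut-off step can be sketched as follows. The text does not state how the two predictions were combined, so the sketch shows both a consensus (both predictors) and a union (either predictor) as assumptions. It also assumes that a residue scoring exactly 0.5 counts as binding. The per-residue scores are invented toy values, and `residues_above_cutoff` is a hypothetical helper, not part of either webserver.

```python
def residues_above_cutoff(scores, cutoff=0.5):
    """Return the 1-based positions whose score is at or above the cut-off."""
    return {pos for pos, score in enumerate(scores, start=1) if score >= cutoff}


# Toy per-residue protein-binding scores (assumed values, not real output).
disopred_scores = [0.2, 0.7, 0.8, 0.4, 0.9]
iupred_scores = [0.6, 0.6, 0.3, 0.1, 0.95]

disopred_sites = residues_above_cutoff(disopred_scores)
iupred_sites = residues_above_cutoff(iupred_scores)

# Consensus: residues flagged by both predictors.
consensus_sites = sorted(disopred_sites & iupred_sites)
# Union: residues flagged by at least one predictor.
either_sites = sorted(disopred_sites | iupred_sites)

print("consensus:", consensus_sites)
print("either:", either_sites)
```

The consensus is the stricter choice and the union the more permissive one; which is appropriate depends on how conservative the binding-site call should be.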

Phosphorylation prediction:
The residues that can be phosphorylated, i.e., Ser, Thr and Tyr, were identified within the ORF3 proteins of HEV using the DEPP (Disorder enhanced phosphorylation prediction) online tool.
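Before any predictor is applied, the candidate positions are simply the Ser, Thr and Tyr residues in the sequence. The sketch below only enumerates these candidates; it does not reproduce DEPP or assign any phosphorylation score. The function name `phospho_candidates` and the toy sequence are hypothetical.

```python
# Residues that can carry a phosphate group: Ser (S), Thr (T), Tyr (Y).
PHOSPHO_RESIDUES = set("STY")


def phospho_candidates(sequence):
    """Return (position, residue) pairs for every Ser, Thr or Tyr (1-based)."""
    return [
        (pos, aa)
        for pos, aa in enumerate(sequence.upper(), start=1)
        if aa in PHOSPHO_RESIDUES
    ]


# Toy sequence for illustration only (an assumption, not real ORF3 data).
toy_sequence = "MPSPAPGLSATY"
for pos, aa in phospho_candidates(toy_sequence):
    print(f"{aa}{pos}")
```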

Structure-based function prediction:
The possible gene ontology (GO)-based functions and processes were explored from the 3D modelled structures of HEV ORF3 using the COFACTOR algorithm [39].

Results:
The HEV genome encodes 3 well-defined ORFs, i.e., ORF1, ORF2 and ORF3. ORF3 starts at nucleotide position 5131 and terminates at nucleotide position 5475. A diagrammatic illustration of the HEV genome, based on GenBank accession ID AF444002, is shown in Figure 1 [44].
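The stated coordinates allow a quick consistency check on the size of ORF3. The sketch below assumes that the coordinates are 1-based and inclusive and that the span ends with a stop codon; neither assumption is stated in the text, and `orf_lengths` is a hypothetical helper.

```python
def orf_lengths(start, end, includes_stop=True):
    """Return (nucleotides, codons, amino acids) for an inclusive ORF span."""
    n_nucleotides = end - start + 1
    if n_nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    n_codons = n_nucleotides // 3
    # The stop codon, if present in the span, does not encode a residue.
    n_residues = n_codons - 1 if includes_stop else n_codons
    return n_nucleotides, n_codons, n_residues


nucleotides, codons, residues = orf_lengths(5131, 5475)
print(nucleotides, codons, residues)  # 345 115 114
```

Under these assumptions the span covers 345 nucleotides, i.e., 115 codons, which corresponds to a polypeptide of 114 residues.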

Note: GT I (JF443720); GT II (M74506); GT III (AB222182); GT IV (GU119961); GT V (AB573435); GT VI (AB602441); GT VII (KJ496143); GT VIII (KX387865).
The amino acid patterns in the ORF3 polypeptide sequences were evaluated to reveal distinctive features of ORF3. The computed percentages of amino acids in ORF3 are stated in Table 1. Our analysis revealed that the ORF3 polypeptides were deficient in most of the order-promoting residues, including Asn, Ile, Phe, Trp and Tyr, and showed normal fractions of Cys; however, they were richly endowed with the order-promoting residues Leu and Val. In contrast, most of the disorder-promoting residues, such as Ala, Gly, Pro and Ser, were abundant in the ORF3 protein sequences, with a normal percentage of Arg. The other disorder-promoting residues, Gln and Glu, were observed in negligible amounts, and Lys was absent from the ORF3 polypeptide (Figure 2). The major amino acids contributing to the ORF3 polypeptide chains were Pro, Leu, Ser, Ala, Gly and Val, which clearly revealed an abundance of disorder-promoting residues (Pro, Ser, Gly and Ala) with a limited number of order-promoting residues (Leu and Val). Notably, the most represented amino acid in the ORF3 polypeptide chain was Pro, a disorder-promoting amino acid (Figure 2). Taken together, our initial analysis interpreted the ORF3 proteins as either IDPs (entirely disordered proteins) or IDPRs (proteins consisting of intrinsically disordered regions in combination with structured globular domains) [29]. This composition analysis prompted us to evaluate the disorder distribution in the ORF3 polypeptide chains using different bioinformatics predictors.
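The comparison of disorder-promoting and order-promoting residues can be summarized as two class-level percentages. The sketch below uses the residue grouping named in this paper (disorder-promoting: Ala, Arg, Gly, Gln, Ser, Pro, Glu, Lys; order-promoting: Trp, Cys, Phe, Ile, Tyr, Val, Leu, Asn) and treats all remaining residues as "other". The function name `residue_class_percent` and the toy sequence are hypothetical.

```python
# Residue classes as named in the text (one-letter codes).
DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")


def residue_class_percent(sequence):
    """Return the percentage of residues in each class for a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    n_order = sum(1 for aa in seq if aa in ORDER_PROMOTING)
    n_other = len(seq) - n_disorder - n_order
    return {
        "disorder_promoting": 100.0 * n_disorder / len(seq),
        "order_promoting": 100.0 * n_order / len(seq),
        "other": 100.0 * n_other / len(seq),
    }


# Toy sequence for illustration only (an assumption, not real ORF3 data).
print(residue_class_percent("MPSPAPGLSA"))
```

A sequence whose disorder-promoting percentage is much larger than its order-promoting percentage shows the compositional bias described above, although composition alone does not establish disorder.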

Disorder in ORF3 polypeptide chains: Quantifying disorder by calculating the predicted percentage of disordered residues
We classified the HEV ORF3 proteins as structured, moderately disordered or highly disordered based on their overall fraction of predicted intrinsic disorder, i.e., <10% disorder, ≥10% to <30% disorder and ≥30% disorder, respectively.

(i) ORDPs (ordered proteins):
These proteins contain less than 30% disordered residues in their polypeptide chains and lack a disordered domain, defined as a disordered segment of 30 or more consecutive amino acid residues at either the C-terminus or the N-terminus, or of 40 or more consecutive amino acid residues at positions other than the N- and C-termini.

(ii) IDPRs (structured proteins with IDRs):
These proteins contain less than 30% disordered residues in their polypeptide chains but have at least one disordered domain, either at the C-terminus or the N-terminus (a disordered segment of 30 or more consecutive amino acid residues) or at positions other than the N- and C-termini (a disordered segment of 40 or more consecutive amino acid residues).

(iii) IDPs (intrinsically disordered/unstructured proteins):
These proteins contain more than 30% disordered residues in their polypeptide chains.
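The three categories above can be expressed as a small decision rule. The sketch below is one possible reading of the criteria and makes two assumptions that the text leaves open: a protein with exactly 30% disordered residues is treated as an IDP, and a disordered segment counts as terminal only if it includes the first or the last residue. All function names are hypothetical.

```python
def disordered_segments(flags):
    """Return (start, end) pairs, 1-based inclusive, for runs of True."""
    segments, start = [], None
    for pos, is_disordered in enumerate(flags, start=1):
        if is_disordered and start is None:
            start = pos
        elif not is_disordered and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


def has_disordered_domain(flags, terminal_min=30, internal_min=40):
    """Check for a long disordered segment at a terminus or internally."""
    for start, end in disordered_segments(flags):
        length = end - start + 1
        is_terminal = start == 1 or end == len(flags)
        if length >= (terminal_min if is_terminal else internal_min):
            return True
    return False


def classify_protein(flags):
    """Classify per-residue disorder flags as 'ORDP', 'IDPR' or 'IDP'."""
    if not flags:
        raise ValueError("flags must not be empty")
    percent_disordered = 100.0 * sum(flags) / len(flags)
    if percent_disordered >= 30.0:  # assumption: 30% itself counts as IDP
        return "IDP"
    return "IDPR" if has_disordered_domain(flags) else "ORDP"


# Toy inputs: True marks a residue predicted as disordered.
print(classify_protein([False] * 90 + [True] * 10))   # short tail
print(classify_protein([False] * 120 + [True] * 30))  # 30-residue C-terminal tail
print(classify_protein([True] * 60 + [False] * 40))   # mostly disordered
```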

3D modelled structures with predicted disorder
Figure 2 provides 3D depictions of the ORF3 proteins from various HEV strains, generated using I-TASSER. The two major secondary structure elements, alpha-helices and beta-strands, were identified in combination with disordered regions in the modelled ORF3 structures, as summarized in Table 2 (Figure 3). The 3D structures showed a dominance of loops or coils, as disordered segments are necessarily present within the loops/coils of proteins [47]. As shown in Table 2, the disorder percentages identified in the modelled ORF3 structures clearly indicated a significant amount of intrinsic disorder in the ORF3 proteins. Disorder prediction from the Phyre2 modelled structures revealed the ORF3 proteins to be moderately disordered (≥10% to <30% disorder) or highly disordered (≥30% disorder) on the basis of the overall predicted intrinsic disorder fraction. Further, the analysis ruled out categorizing ORF3 as a highly ordered protein (PPID <10%), as none of the polypeptide chains showed less than 10% disorder. Therefore, the significant fraction of disorder in the ORF3 proteins prompted us to further evaluate their disorder using different PONDR algorithms, i.e., VSL2, VL3 and VL-XT.

Disorder analysis with PONDR-VLXT, PONDR-VL3 and PONDR-VSL2:
The predisposition to intrinsic disorder in the HEV ORF3 proteins was evaluated using PONDR. Scores >0.5 corresponded to disordered residues, and different colours were used to depict the disordered regions in the ORF3 proteins: purple for regions predicted as disordered by PONDR-VSL2, blue for PONDR-VL3 and red for PONDR-VLXT.
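The percentage of disordered residues reported for each predictor follows from the per-residue scores and the 0.5 threshold. The sketch below shows that calculation on invented toy profiles; the score values and the helper name `percent_disordered` are assumptions and are not actual PONDR output.

```python
def percent_disordered(scores, threshold=0.5):
    """Return the percentage of residues scoring above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    n_disordered = sum(1 for score in scores if score > threshold)
    return round(100.0 * n_disordered / len(scores), 2)


# Toy per-residue disorder scores for three predictors (assumed values).
toy_profiles = {
    "VLXT": [0.9, 0.8, 0.4, 0.6],
    "VL3": [0.2, 0.3, 0.4, 0.6],
    "VSL2": [0.7, 0.9, 0.95, 0.51],
}

for predictor, scores in toy_profiles.items():
    print(f"{predictor}: {percent_disordered(scores):.2f}% disordered")
```

A strict comparison is used here because the text defines disordered residues as those with scores above 0.5.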
The predicted disorder patterns of the ORF3 polypeptides, obtained from the disorder predictors, are given in Table 3. The disorder distribution profiles of the ORF3 proteins are shown in Figure 4A-H.
Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 48 to 73 consecutive amino acid residues, grouped it into IDPs (as computed by all PONDR members).

ORF3 protein (AB222182):
The ORF3 polypeptide AB222182 was revealed as a highly disordered protein, as it consisted of >30% disordered residues (66.39% by VLXT and 88.52% by VSL2). Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 43 to 66 consecutive amino acid residues, grouped it into IDPs (as computed by two PONDR members: VLXT and VSL2).

ORF3 protein (AB573435):
The ORF3 polypeptide AB573435 was revealed as a highly disordered protein, as it consisted of >30% disordered residues (75.89% by VLXT, 100.00% by VL3 and 91.07% by VSL2). Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 74 to 112 consecutive amino acid residues, grouped it into IDPs (as computed by all PONDR members).

ORF3 protein (AB602441):
The ORF3 polypeptide AB602441 was revealed as a highly disordered protein, as it consisted of >30% disordered residues (70.54% by VLXT, 48.21% by VL3 and 85.71% by VSL2). Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 47 to 88 consecutive amino acid residues, grouped it into IDPs (as computed by all PONDR members).

ORF3 protein (KJ496143):
The ORF3 polypeptide KJ496143 was revealed as a highly disordered protein, as it consisted of >30% disordered residues (55.75% by VLXT, 58.41% by VL3 and 58.41% by VSL2). Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 25 to 60 consecutive amino acid residues, grouped it into IDPs (as computed by all PONDR members).

ORF3 protein (KX387865):
The ORF3 polypeptide KX387865 was revealed as a highly disordered protein, as it consisted of >30% disordered residues (70.54% by VLXT, 63.39% by VL3 and 59.82% by VSL2). Additionally, the presence of a disordered domain at the C-terminus of the ORF3 polypeptide, i.e., of 58 to 62 consecutive amino acid residues, grouped it into IDPs (as computed by all PONDR members).

Categorizing ORF3 protein into disorder variant:
To make our findings more transparent, the results obtained from the different disorder predictors were combined. They revealed HEV ORF3 to be a highly disordered protein (as the overall intrinsic disorder fraction was predicted to be ≥30% in the polypeptide) or an IDP (as the predicted overall percentage of disordered residues was >30% in combination with a disordered domain in the polypeptide), as mentioned in Table 3. Thus, the high content of intrinsic disorder in the HEV-ORF3 protein signified its ability to interact with other molecules by revealing its disorder-based binding tendency. Moreover, the presence of disordered domains at the C-terminus of the ORF3 protein showed its propensity to bind to the ORF2 protein as well as to host components. As our intrinsic disorder propensity analysis was in line with the initial disorder prediction, we further examined the protein-binding regions in the ORF3 proteins to make our findings more comprehensive and consistent.

Potential disorder-based binding protein regions:
The disordered protein-binding residues within the disordered ORF3 protein sequences were identified and are listed in Table 4. The disordered protein-binding residues identified using DISOPRED3 are shown in Figure 5. The protein-binding propensity analyses of HEV-ORF3 are thus also in line with the initial disorder prediction, as protein-binding sites (as predicted by DISOPRED3 and IUPred2A) were predicted towards both the N- and C-termini of the ORF3 protein sequences.

Prediction of gene ontology terms through COFACTOR algorithm:
The three top-ranked molecular functions and biological processes predicted from the 3D modelled ORF3 structures generated using I-TASSER are described in Table 6.

Discussion:
The ORF3 protein has recently been linked to host immunity and signalling, host tropism and vaccine development [36, 37]; hence, it is an ideal target for devising treatment against HEV. In view of this, we performed a sequence-based analysis of the HEV ORF3 sequences to shed light on the prevalence of intrinsic disorder in them using a bioinformatics approach. This novel study reports the elucidation of the unstructured regions of the ORF3 protein to shed light on their implications in HEV regulation and pathogenesis. As disordered regions are rooted in the idiosyncrasies of their amino acid composition, we examined the amino acid composition of the ORF3 polypeptides to determine their residue percentages. Investigations have revealed that IDRs (IDPRs/IDPs) possess a peculiar pattern of amino acid sequences, which differentiates them from ordered proteins [48-51]. As suggested in these reports, IDRs are enriched in disorder-promoting residues, such as Ala (A), Arg (R), Gly (G), Gln (Q), Ser (S), Pro (P), Glu (E) and Lys (K), and are deficient in order-promoting residues, such as Trp (W), Cys (C), Phe (F), Ile (I), Tyr (Y), Val (V), Leu (L) and Asn (N) [48-51]. It was also proposed that His (H), Met (M), Thr (T) and Asp (D)
Furthermore, the identified processes, for instance, exocytosis, proteolysis, acute inflammation, transcription regulation and cell wall organization, further signified the critical role played by ORF3 in HEV regulation and pathogenesis. Altogether, the ORF3-associated molecular functions and biological processes clearly showed its involvement in multiple crucial roles in HEV [43]. Importantly, IDPRs/IDPs have been associated with the regulation of, as well as interaction with, multiple unrelated partners owing to their complex and heterogeneous structural organization, which makes them multifunctional molecules [105]. These observations further substantiate our findings. Altogether, the findings of the current study suggest that ORF3 is a protein associated with multiple functions beyond its accessory roles in HEV.

Conclusions:
The study sheds novel light on the extent of intrinsic disorder in the ORF3 protein of HEV. Sequences from a publicly available online database were used to perform a comprehensive computational analysis of the extent of intrinsic disorder in HEV ORF3. The ORF3 protein sequences revealed an abundance of signature disorder-promoting amino acid residues, which clearly indicated that the ORF3 protein is either an IDPR, i.e., a protein consisting of intrinsically disordered regions in combination with structured globular domains, or an IDP, i.e., an entirely disordered protein. The modelled ORF3 structures revealed a significant fraction of disorder, identifying ORF3 as a moderately or highly disordered variant. Our structural analysis was in accordance with the initial amino acid compositional analysis, which suggested that ORF3 contains a significant percentage of IDRs. The prevalence of IDRs (IDPRs/IDPs) in ORF3 further urged us to evaluate its disorder status. Examination of the disorder distribution (through different predictors) categorized ORF3 as an IDP or a highly disordered protein, suggesting its involvement in various significant regulatory functions of viruses. The C-terminus was observed to have a larger fraction of intrinsic disorder than the N-terminus. Additionally, most of the identified protein-binding residues in the ORF3 protein sequences were also located towards the C-terminus. The presence of post-translational modifications (such as phosphorylation) in the ORF3 protein further signified its involvement in various important mechanisms. Finally, the identified structure-based gene ontology terms clearly revealed multiple functions associated with ORF3. Our study may, in the near future, provide critical information on the unknown functions associated with the HEV-ORF3 protein.

Figure 1: Illustration depicting the HEV genome. The genome is organized into 3 ORFs, i.e., ORF1, ORF2 and ORF3. The nucleotide positions of the ORFs in the HEV genome are given with reference to the Sar55 strain (accession ID AF444002) [44].
Evaluation of amino acid patterns:
[45]. Further, we categorically grouped the ORF3 proteins into ORDPs, IDPRs and IDPs based on the overall fraction of disordered residues and the length of the disordered domain [46].

Figure 5: Representation of disordered protein binding residues in HEV-ORF3.The disordered protein binding residues in ORF3 amino acid sequences are represented in green outlined boxes.The major secondary structure elements including alpha-helices and beta-sheets are also depicted.The analysis was conducted using PSIPRED.

Table 2: Secondary structure and disorder prediction in HEV-ORF3 proteins

Table 3: Intrinsic disorder score prediction in the HEV-ORF3 proteins.

Table 6: GO term prediction for HEV-ORF3 modelled structure
Several functions, including protein binding, DNA binding and flavin adenine dinucleotide binding, were predicted, which clearly uncovered the propensity of ORF3 to bind to several types of molecules that have previously been reported in regulation [100]. Interestingly, ORF3 was also predicted to be involved in significant processes such as axon guidance [101] and regulation of the neuron apoptotic process [102], which pointed to a role in neural development. Axon guidance, or axon pathfinding, refers to the process by which a neuron sends out axons to reach their correct targets. A study has demonstrated the role of axon guidance signalling pathways in the control of gene expression [103]. Regulation of neuronal apoptotic cell death plays a major role in shaping the development of the nervous system during embryogenesis [104].